Overview

Dataset statistics

Number of variables: 15
Number of observations: 35952
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 16.8 MiB
Average record size in memory: 488.9 B

Variable types

Categorical: 6
Numeric: 9

Alerts

Make has a high cardinality: 127 distinct values (High cardinality)
Model has a high cardinality: 3608 distinct values (High cardinality)
Engine Displacement is highly correlated with Cylinders and 8 other fields (High correlation)
Cylinders is highly correlated with Engine Displacement and 8 other fields (High correlation)
Fuel Barrels/Year is highly correlated with Engine Displacement and 9 other fields (High correlation)
City MPG is highly correlated with Engine Displacement and 8 other fields (High correlation)
Highway MPG is highly correlated with Engine Displacement and 8 other fields (High correlation)
Combined MPG is highly correlated with Engine Displacement and 8 other fields (High correlation)
CO2 Emission Grams/Mile is highly correlated with Engine Displacement and 9 other fields (High correlation)
Fuel Cost/Year is highly correlated with Engine Displacement and 7 other fields (High correlation)
Year is highly correlated with Transmission and 1 other field (High correlation)
Transmission is highly correlated with Year and 10 other fields (High correlation)
Drivetrain is highly correlated with Transmission and 1 other field (High correlation)
Vehicle Class is highly correlated with Year and 10 other fields (High correlation)
Fuel Type is highly correlated with Transmission and 2 other fields (High correlation)

Reproduction

Analysis started: 2022-11-01 08:31:12.219039
Analysis finished: 2022-11-01 08:31:21.764970
Duration: 9.55 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

Make
Categorical

HIGH CARDINALITY

Distinct: 127
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value               Count
Chevrolet           3643
Ford                2946
Dodge               2360
GMC                 2347
Toyota              1836
Other values (122)  22820

Length

Max length: 34
Median length: 27
Mean length: 6.321289497
Min length: 3

Characters and Unicode

Total characters: 227263
Distinct characters: 52
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
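
As a rough illustration of the character properties mentioned above, the following sketch tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the two sample strings are illustrative assumptions; this is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Ll') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical two-value sample; the real column holds 35952 values.
sample = ["AM General", "Mercedes-Benz"]
counts = category_counts(sample)
print(counts)
```

The two-letter codes correspond to the category names used in this report: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Pd` is Dash Punctuation, `Zs` is Space Separator and `Po` is Other Punctuation.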

Unique

Unique: 24
Unique (%): 0.1%
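
The report lists both "Distinct" and "Unique" counts. A minimal sketch of one common reading, assuming "distinct" means the number of different values and "unique" means the number of values that occur exactly once (an assumption about the profiler's definitions, not something this report states). The function name and the sample list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for count in freq.values() if count == 1)
    return distinct, unique


# Hypothetical sample, not taken from the dataset.
sample = ["AM General", "AM General", "ASC Incorporated", "Ford", "Ford", "GMC"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)

# The percentages in the report are relative to the 35952 observations.
print(round(100 * 127 / 35952, 1), round(100 * 24 / 35952, 1))
```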

Sample

1st row: AM General
2nd row: AM General
3rd row: AM General
4th row: AM General
5th row: ASC Incorporated

Common Values

Value               Count  Frequency (%)
Chevrolet           3643   10.1%
Ford                2946   8.2%
Dodge               2360   6.6%
GMC                 2347   6.5%
Toyota              1836   5.1%
BMW                 1677   4.7%
Mercedes-Benz       1284   3.6%
Nissan              1253   3.5%
Volkswagen          1047   2.9%
Mitsubishi          950    2.6%
Other values (117)  16609  46.2%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
chevrolet           3643   9.9%
ford                2946   8.0%
dodge               2360   6.4%
gmc                 2349   6.4%
toyota              1836   5.0%
bmw                 1680   4.6%
mercedes-benz       1284   3.5%
nissan              1253   3.4%
volkswagen          1047   2.9%
mitsubishi          950    2.6%
Other values (167)  17322  47.2%
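
The lowercase table above appears to count whitespace-separated tokens rather than whole values (for example, `gmc` occurs 2349 times, against 2347 rows whose value is exactly `GMC`). A minimal sketch of such a count, assuming lowercasing, splitting on whitespace and stripping surrounding punctuation; the profiler's actual tokenisation may differ, `word_counts` is a hypothetical helper, and the sample values are made up.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # Drop punctuation at the ends only; inner hyphens are kept.
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts


# Hypothetical sample values, not taken from the dataset.
sample = ["GMC", "General Motors GMC", "Mercedes-Benz", "Chevrolet", "Chevrolet"]
words = word_counts(sample)
print(words.most_common(3))
```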

Most occurring characters

ValueCountFrequency (%)
e22251
 
9.8%
o20505
 
9.0%
r14426
 
6.3%
a14328
 
6.3%
i11206
 
4.9%
d10984
 
4.8%
s10114
 
4.5%
l9391
 
4.1%
t9333
 
4.1%
u9103
 
4.0%
Other values (42)95622
42.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter177779
78.2%
Uppercase Letter47223
 
20.8%
Dash Punctuation1447
 
0.6%
Space Separator719
 
0.3%
Other Punctuation95
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e22251
12.5%
o20505
11.5%
r14426
 
8.1%
a14328
 
8.1%
i11206
 
6.3%
d10984
 
6.2%
s10114
 
5.7%
l9391
 
5.3%
t9333
 
5.2%
u9103
 
5.1%
Other values (14)46138
26.0%
Uppercase Letter
ValueCountFrequency (%)
M8417
17.8%
C7240
15.3%
B3639
 
7.7%
F3213
 
6.8%
G2521
 
5.3%
D2458
 
5.2%
P2259
 
4.8%
S2126
 
4.5%
T1868
 
4.0%
V1781
 
3.8%
Other values (14)11701
24.8%
Other Punctuation
ValueCountFrequency (%)
.85
89.5%
,10
 
10.5%
Dash Punctuation
ValueCountFrequency (%)
-1447
100.0%
Space Separator
ValueCountFrequency (%)
719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin225002
99.0%
Common2261
 
1.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e22251
 
9.9%
o20505
 
9.1%
r14426
 
6.4%
a14328
 
6.4%
i11206
 
5.0%
d10984
 
4.9%
s10114
 
4.5%
l9391
 
4.2%
t9333
 
4.1%
u9103
 
4.0%
Other values (38)93361
41.5%
Common
ValueCountFrequency (%)
-1447
64.0%
719
31.8%
.85
 
3.8%
,10
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII227263
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e22251
 
9.8%
o20505
 
9.0%
r14426
 
6.3%
a14328
 
6.3%
i11206
 
4.9%
d10984
 
4.8%
s10114
 
4.5%
l9391
 
4.1%
t9333
 
4.1%
u9103
 
4.0%
Other values (42)95622
42.1%

Model
Categorical

HIGH CARDINALITY

Distinct: 3608
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB

Value                Count
F150 Pickup 2WD      197
F150 Pickup 4WD      179
Truck 2WD            173
Mustang              170
Jetta                169
Other values (3603)  35064

Length

Max length: 47
Median length: 34
Mean length: 11.33708834
Min length: 1

Characters and Unicode

Total characters: 407591
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 631
Unique (%): 1.8%

Sample

1st row: DJ Po Vehicle 2WD
2nd row: FJ8c Post Office
3rd row: Post Office DJ5 2WD
4th row: Post Office DJ8 2WD
5th row: GNX

Common Values

Value                Count  Frequency (%)
F150 Pickup 2WD      197    0.5%
F150 Pickup 4WD      179    0.5%
Truck 2WD            173    0.5%
Mustang              170    0.5%
Jetta                169    0.5%
Ranger Pickup 2WD    161    0.4%
Sierra 1500 4WD      149    0.4%
Sierra 1500 2WD      146    0.4%
Camaro               146    0.4%
Civic                139    0.4%
Other values (3598)  34323  95.5%

Length

Histogram of lengths of the category

Value                Count  Frequency (%)
2wd                  6921   9.3%
4wd                  4913   6.6%
pickup               3011   4.1%
awd                  1929   2.6%
wagon                1908   2.6%
1500                 1159   1.6%
fwd                  1029   1.4%
convertible          970    1.3%
van                  666    0.9%
coupe                665    0.9%
Other values (1814)  51018  68.8%

Most occurring characters

ValueCountFrequency (%)
38290
 
9.4%
a26703
 
6.6%
r23857
 
5.9%
e21668
 
5.3%
o18715
 
4.6%
W17524
 
4.3%
D16715
 
4.1%
i16534
 
4.1%
n15592
 
3.8%
014424
 
3.5%
Other values (59)197569
48.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter212103
52.0%
Uppercase Letter96298
23.6%
Decimal Number55267
 
13.6%
Space Separator38290
 
9.4%
Other Punctuation2877
 
0.7%
Dash Punctuation1088
 
0.3%
Open Punctuation831
 
0.2%
Close Punctuation831
 
0.2%
Math Symbol6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a26703
12.6%
r23857
11.2%
e21668
10.2%
o18715
 
8.8%
i16534
 
7.8%
n15592
 
7.4%
t11821
 
5.6%
u10125
 
4.8%
c8634
 
4.1%
l8008
 
3.8%
Other values (16)50446
23.8%
Uppercase Letter
ValueCountFrequency (%)
W17524
18.2%
D16715
17.4%
C10067
10.5%
S8602
8.9%
P4618
 
4.8%
A4610
 
4.8%
T4160
 
4.3%
F3833
 
4.0%
V2978
 
3.1%
E2956
 
3.1%
Other values (16)20235
21.0%
Decimal Number
ValueCountFrequency (%)
014424
26.1%
210697
19.4%
58506
15.4%
47624
13.8%
15908
10.7%
33573
 
6.5%
61590
 
2.9%
91131
 
2.0%
8954
 
1.7%
7860
 
1.6%
Other Punctuation
ValueCountFrequency (%)
/2639
91.7%
.238
 
8.3%
Space Separator
ValueCountFrequency (%)
38290
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1088
100.0%
Open Punctuation
ValueCountFrequency (%)
(831
100.0%
Close Punctuation
ValueCountFrequency (%)
)831
100.0%
Math Symbol
ValueCountFrequency (%)
>6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin308401
75.7%
Common99190
 
24.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a26703
 
8.7%
r23857
 
7.7%
e21668
 
7.0%
o18715
 
6.1%
W17524
 
5.7%
D16715
 
5.4%
i16534
 
5.4%
n15592
 
5.1%
t11821
 
3.8%
u10125
 
3.3%
Other values (42)129147
41.9%
Common
ValueCountFrequency (%)
38290
38.6%
014424
 
14.5%
210697
 
10.8%
58506
 
8.6%
47624
 
7.7%
15908
 
6.0%
33573
 
3.6%
/2639
 
2.7%
61590
 
1.6%
91131
 
1.1%
Other values (7)4808
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII407591
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
38290
 
9.4%
a26703
 
6.6%
r23857
 
5.9%
e21668
 
5.3%
o18715
 
4.6%
W17524
 
4.3%
D16715
 
4.1%
i16534
 
4.1%
n15592
 
3.8%
014424
 
3.5%
Other values (59)197569
48.5%

Year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 34
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2000.7164
Minimum: 1984
Maximum: 2017
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 1984
5-th percentile: 1985
Q1: 1991
median: 2001
Q3: 2010
95-th percentile: 2016
Maximum: 2017
Range: 33
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 10.08528955
Coefficient of variation (CV): 0.005040839147
Kurtosis: -1.309415619
Mean: 2000.7164
Median Absolute Deviation (MAD): 9
Skewness: -0.05659508101
Sum: 71929756
Variance: 101.7130653
Monotonicity: Not monotonic
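
Several of the figures above are related by simple arithmetic. The following sketch re-derives the range, the IQR, the coefficient of variation and the variance from the values printed for Year. The variable names are illustrative, and the formulas are the standard textbook definitions, assumed (not confirmed by this report) to be the ones the profiler uses.

```python
# Values copied from the Year statistics above.
minimum, maximum = 1984, 2017
q1, q3 = 1991, 2010
mean = 2000.7164
std = 10.08528955

value_range = maximum - minimum  # 33
iqr = q3 - q1                    # 19
cv = std / mean                  # coefficient of variation, about 0.00504
variance = std ** 2              # about 101.713

print(value_range, iqr, cv, variance)
```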
Histogram with fixed size bins (bins=34)

Common Values

Value               Count  Frequency (%)
1985                1581   4.4%
2015                1264   3.5%
2016                1228   3.4%
2014                1211   3.4%
1987                1198   3.3%
1986                1188   3.3%
2008                1184   3.3%
2009                1183   3.3%
2013                1168   3.2%
2005                1156   3.2%
Other values (24)   23591  65.6%

Minimum Values

Value               Count  Frequency (%)
1984                645    1.8%
1985                1581   4.4%
1986                1188   3.3%
1987                1198   3.3%
1988                1119   3.1%
1989                1127   3.1%
1990                1068   3.0%
1991                1122   3.1%
1992                1107   3.1%
1993                1077   3.0%

Maximum Values

Value               Count  Frequency (%)
2017                857    2.4%
2016                1228   3.4%
2015                1264   3.5%
2014                1211   3.4%
2013                1168   3.2%
2012                1139   3.2%
2011                1124   3.1%
2010                1108   3.1%
2009                1183   3.3%
2008                1184   3.3%

Engine Displacement
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 65
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.338492991
Minimum: 0.6
Maximum: 8.4
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 0.6
5-th percentile: 1.6
Q1: 2.2
median: 3
Q3: 4.3
95-th percentile: 5.9
Maximum: 8.4
Range: 7.8
Interquartile range (IQR): 2.1

Descriptive statistics

Standard deviation: 1.359395386
Coefficient of variation (CV): 0.4071883302
Kurtosis: -0.5744958571
Mean: 3.338492991
Median Absolute Deviation (MAD): 1
Skewness: 0.6053760668
Sum: 120025.5
Variance: 1.847955816
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
2                   3342   9.3%
3                   2869   8.0%
2.5                 2286   6.4%
2.4                 1877   5.2%
3.5                 1452   4.0%
1.8                 1439   4.0%
1.6                 1343   3.7%
5                   1329   3.7%
4.3                 1319   3.7%
2.2                 1237   3.4%
Other values (55)   17459  48.6%

Minimum Values

Value               Count  Frequency (%)
0.6                 3      < 0.1%
0.9                 4      < 0.1%
1                   162    0.5%
1.1                 8      < 0.1%
1.2                 38     0.1%
1.3                 178    0.5%
1.4                 188    0.5%
1.5                 667    1.9%
1.6                 1343   3.7%
1.7                 46     0.1%

Maximum Values

Value               Count  Frequency (%)
8.4                 11     < 0.1%
8.3                 9      < 0.1%
8                   23     0.1%
7.4                 4      < 0.1%
7                   10     < 0.1%
6.8                 126    0.4%
6.7                 69     0.2%
6.6                 20     0.1%
6.5                 111    0.3%
6.4                 31     0.1%

Cylinders
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.765075656
Minimum: 2
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 4
median: 6
Q3: 6
95-th percentile: 8
Maximum: 16
Range: 14
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.755268232
Coefficient of variation (CV): 0.3044657757
Kurtosis: 0.9226341968
Mean: 5.765075656
Median Absolute Deviation (MAD): 2
Skewness: 0.8462732618
Sum: 207266
Variance: 3.080966565
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Common Values

Value               Count  Frequency (%)
4                   13494  37.5%
6                   12765  35.5%
8                   7998   22.2%
5                   723    2.0%
12                  562    1.6%
3                   201    0.6%
10                  153    0.4%
2                   48     0.1%
16                  8      < 0.1%

Minimum Values

Value               Count  Frequency (%)
2                   48     0.1%
3                   201    0.6%
4                   13494  37.5%
5                   723    2.0%
6                   12765  35.5%
8                   7998   22.2%
10                  153    0.4%
12                  562    1.6%
16                  8      < 0.1%

Maximum Values

Value               Count  Frequency (%)
16                  8      < 0.1%
12                  562    1.6%
10                  153    0.4%
8                   7998   22.2%
6                   12765  35.5%
5                   723    2.0%
4                   13494  37.5%
3                   201    0.6%
2                   48     0.1%

Transmission
Categorical

HIGH CORRELATION

Distinct: 45
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

Value               Count
Automatic 4-spd     10585
Manual 5-spd        7787
Automatic (S6)      2631
Automatic 3-spd     2597
Manual 6-spd        2423
Other values (40)   9929

Length

Max length: 32
Median length: 17
Mean length: 14.09454272
Min length: 8

Characters and Unicode

Total characters: 506727
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: Automatic 3-spd
2nd row: Automatic 3-spd
3rd row: Automatic 3-spd
4th row: Automatic 3-spd
5th row: Automatic 4-spd

Common Values

Value               Count  Frequency (%)
Automatic 4-spd     10585  29.4%
Manual 5-spd        7787   21.7%
Automatic (S6)      2631   7.3%
Automatic 3-spd     2597   7.2%
Manual 6-spd        2423   6.7%
Automatic 5-spd     2171   6.0%
Automatic 6-spd     1432   4.0%
Manual 4-spd        1306   3.6%
Automatic (S8)      960    2.7%
Automatic (S5)      822    2.3%
Other values (35)   3238   9.0%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
automatic           23408  32.3%
4-spd               11891  16.4%
manual              11659  16.1%
5-spd               9958   13.8%
6-spd               3855   5.3%
3-spd               2671   3.7%
s6                  2631   3.6%
s8                  960    1.3%
s5                  822    1.1%
7-spd               730    1.0%
Other values (34)   3789   5.2%

Most occurring characters

ValueCountFrequency (%)
a49432
 
9.8%
t48373
 
9.5%
36422
 
7.2%
u35952
 
7.1%
s30115
 
5.9%
-30038
 
5.9%
d29440
 
5.8%
p29440
 
5.8%
A25191
 
5.0%
o24965
 
4.9%
Other values (24)167359
33.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter348690
68.8%
Uppercase Letter43282
 
8.5%
Space Separator36422
 
7.2%
Decimal Number35271
 
7.0%
Dash Punctuation30038
 
5.9%
Open Punctuation6512
 
1.3%
Close Punctuation6512
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a49432
14.2%
t48373
13.9%
u35952
10.3%
s30115
8.6%
d29440
8.4%
p29440
8.4%
o24965
7.2%
i24758
7.1%
c23408
6.7%
m23408
6.7%
Other values (7)29399
8.4%
Decimal Number
ValueCountFrequency (%)
412122
34.4%
510795
30.6%
66849
19.4%
32673
 
7.6%
71470
 
4.2%
81241
 
3.5%
9117
 
0.3%
14
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
A25191
58.2%
M12307
28.4%
S5529
 
12.8%
V251
 
0.6%
L4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
36422
100.0%
Dash Punctuation
ValueCountFrequency (%)
-30038
100.0%
Open Punctuation
ValueCountFrequency (%)
(6512
100.0%
Close Punctuation
ValueCountFrequency (%)
)6512
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin391972
77.4%
Common114755
 
22.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a49432
12.6%
t48373
12.3%
u35952
9.2%
s30115
7.7%
d29440
7.5%
p29440
7.5%
A25191
 
6.4%
o24965
 
6.4%
i24758
 
6.3%
c23408
 
6.0%
Other values (12)70898
18.1%
Common
ValueCountFrequency (%)
36422
31.7%
-30038
26.2%
412122
 
10.6%
510795
 
9.4%
66849
 
6.0%
(6512
 
5.7%
)6512
 
5.7%
32673
 
2.3%
71470
 
1.3%
81241
 
1.1%
Other values (2)121
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII506727
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a49432
 
9.8%
t48373
 
9.5%
36422
 
7.2%
u35952
 
7.1%
s30115
 
5.9%
-30038
 
5.9%
d29440
 
5.8%
p29440
 
5.8%
A25191
 
5.0%
o24965
 
4.9%
Other values (24)167359
33.0%

Drivetrain
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                       Count
Front-Wheel Drive           13044
Rear-Wheel Drive            12726
4-Wheel or All-Wheel Drive  6503
All-Wheel Drive             2039
4-Wheel Drive               1058
Other values (3)            582

Length

Max length: 26
Median length: 23
Mean length: 18.02219626
Min length: 13

Characters and Unicode

Total characters: 647934
Distinct characters: 22
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2-Wheel Drive
2nd row: 2-Wheel Drive
3rd row: Rear-Wheel Drive
4th row: Rear-Wheel Drive
5th row: Rear-Wheel Drive

Common Values

Value                       Count  Frequency (%)
Front-Wheel Drive           13044  36.3%
Rear-Wheel Drive            12726  35.4%
4-Wheel or All-Wheel Drive  6503   18.1%
All-Wheel Drive             2039   5.7%
4-Wheel Drive               1058   2.9%
2-Wheel Drive               423    1.2%
Part-time 4-Wheel Drive     158    0.4%
2-Wheel Drive, Front        1      < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
drive               35952  42.3%
front-wheel         13044  15.3%
rear-wheel          12726  15.0%
all-wheel           8542   10.0%
4-wheel             7719   9.1%
or                  6503   7.6%
2-wheel             424    0.5%
part-time           158    0.2%
front               1      < 0.1%

Most occurring characters

ValueCountFrequency (%)
e133746
20.6%
r68384
10.6%
l59539
9.2%
49117
 
7.6%
-42613
 
6.6%
W42455
 
6.6%
h42455
 
6.6%
i36110
 
5.6%
D35952
 
5.5%
v35952
 
5.5%
Other values (12)101611
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter435182
67.2%
Uppercase Letter112878
 
17.4%
Space Separator49117
 
7.6%
Dash Punctuation42613
 
6.6%
Decimal Number8143
 
1.3%
Other Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e133746
30.7%
r68384
15.7%
l59539
13.7%
h42455
 
9.8%
i36110
 
8.3%
v35952
 
8.3%
o19548
 
4.5%
t13361
 
3.1%
n13045
 
3.0%
a12884
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
W42455
37.6%
D35952
31.9%
F13045
 
11.6%
R12726
 
11.3%
A8542
 
7.6%
P158
 
0.1%
Decimal Number
ValueCountFrequency (%)
47719
94.8%
2424
 
5.2%
Space Separator
ValueCountFrequency (%)
49117
100.0%
Dash Punctuation
ValueCountFrequency (%)
-42613
100.0%
Other Punctuation
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin548060
84.6%
Common99874
 
15.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e133746
24.4%
r68384
12.5%
l59539
10.9%
W42455
 
7.7%
h42455
 
7.7%
i36110
 
6.6%
D35952
 
6.6%
v35952
 
6.6%
o19548
 
3.6%
t13361
 
2.4%
Other values (7)60558
11.0%
Common
ValueCountFrequency (%)
49117
49.2%
-42613
42.7%
47719
 
7.7%
2424
 
0.4%
,1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII647934
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e133746
20.6%
r68384
10.6%
l59539
9.2%
49117
 
7.6%
-42613
 
6.6%
W42455
 
6.6%
h42455
 
6.6%
i36110
 
5.6%
D35952
 
5.5%
v35952
 
5.5%
Other values (12)101611
15.7%

Vehicle Class
Categorical

HIGH CORRELATION

Distinct: 34
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                        Count
Compact Cars                 5185
Subcompact Cars              4374
Midsize Cars                 4063
Standard Pickup Trucks       2311
Sport Utility Vehicle - 4WD  2081
Other values (29)            17938

Length

Max length: 34
Median length: 28
Mean length: 17.86056409
Min length: 4

Characters and Unicode

Total characters: 642123
Distinct characters: 38
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Special Purpose Vehicle 2WD
2nd row: Special Purpose Vehicle 2WD
3rd row: Special Purpose Vehicle 2WD
4th row: Special Purpose Vehicle 2WD
5th row: Midsize Cars

Common Values

Value                        Count  Frequency (%)
Compact Cars                 5185   14.4%
Subcompact Cars              4374   12.2%
Midsize Cars                 4063   11.3%
Standard Pickup Trucks       2311   6.4%
Sport Utility Vehicle - 4WD  2081   5.8%
Two Seaters                  1784   5.0%
Large Cars                   1742   4.8%
Sport Utility Vehicle - 2WD  1615   4.5%
Special Purpose Vehicles     1404   3.9%
Small Station Wagons         1371   3.8%
Other values (24)            10022  27.9%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
cars                16576  16.4%
vehicle             5990   5.9%
pickup              5540   5.5%
trucks              5536   5.5%
compact             5185   5.1%
sport               5152   5.1%
utility             5152   5.1%
standard            5003   4.9%
2wd                 4622   4.6%
midsize             4479   4.4%
Other values (22)   37859  37.4%

Most occurring characters

ValueCountFrequency (%)
65142
 
10.1%
a54931
 
8.6%
i41312
 
6.4%
r39411
 
6.1%
t37854
 
5.9%
s37548
 
5.8%
c37081
 
5.8%
e31682
 
4.9%
p26688
 
4.2%
o25225
 
3.9%
Other values (28)245249
38.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter455736
71.0%
Uppercase Letter106721
 
16.6%
Space Separator65142
 
10.1%
Decimal Number9076
 
1.4%
Dash Punctuation4707
 
0.7%
Other Punctuation741
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a54931
12.1%
i41312
9.1%
r39411
 
8.6%
t37854
 
8.3%
s37548
 
8.2%
c37081
 
8.1%
e31682
 
7.0%
p26688
 
5.9%
o25225
 
5.5%
l21524
 
4.7%
Other values (12)102480
22.5%
Uppercase Letter
ValueCountFrequency (%)
S24343
22.8%
C22193
20.8%
W11488
10.8%
V9254
 
8.7%
D9068
 
8.5%
P8089
 
7.6%
T8057
 
7.5%
M6702
 
6.3%
U5152
 
4.8%
L2375
 
2.2%
Decimal Number
ValueCountFrequency (%)
24628
51.0%
44448
49.0%
Other Punctuation
ValueCountFrequency (%)
,733
98.9%
/8
 
1.1%
Space Separator
ValueCountFrequency (%)
65142
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4707
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin562457
87.6%
Common79666
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a54931
 
9.8%
i41312
 
7.3%
r39411
 
7.0%
t37854
 
6.7%
s37548
 
6.7%
c37081
 
6.6%
e31682
 
5.6%
p26688
 
4.7%
o25225
 
4.5%
S24343
 
4.3%
Other values (22)206382
36.7%
Common
ValueCountFrequency (%)
65142
81.8%
-4707
 
5.9%
24628
 
5.8%
44448
 
5.6%
,733
 
0.9%
/8
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII642123
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
65142
 
10.1%
a54931
 
8.6%
i41312
 
6.4%
r39411
 
6.1%
t37854
 
5.9%
s37548
 
5.8%
c37081
 
5.8%
e31682
 
4.9%
p26688
 
4.2%
o25225
 
3.9%
Other values (28)245249
38.2%

Fuel Type
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value               Count
Regular             23587
Premium             9921
Gasoline or E85     1195
Diesel              911
Premium or E85      121
Other values (8)    217

Length

Max length: 27
Median length: 7
Mean length: 7.298926346
Min length: 3

Characters and Unicode

Total characters: 262411
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Regular
2nd row: Regular
3rd row: Regular
4th row: Regular
5th row: Premium

Common Values

Value                       Count  Frequency (%)
Regular                     23587  65.6%
Premium                     9921   27.6%
Gasoline or E85             1195   3.3%
Diesel                      911    2.5%
Premium or E85              121    0.3%
Midgrade                    74     0.2%
CNG                         60     0.2%
Premium and Electricity     20     0.1%
Gasoline or natural gas     20     0.1%
Premium Gas or Electricity  17     < 0.1%
Other values (3)            26     0.1%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
regular             23605  60.8%
premium             10079  26.0%
or                  1363   3.5%
e85                 1316   3.4%
gasoline            1223   3.2%
diesel              911    2.3%
midgrade            74     0.2%
cng                 60     0.2%
electricity         55     0.1%
gas                 55     0.1%
Other values (3)    64     0.2%

Most occurring characters

ValueCountFrequency (%)
e36866
14.0%
r35204
13.4%
u33704
12.8%
l25814
9.8%
a25041
9.5%
g23699
9.0%
R23605
9.0%
m20158
7.7%
i12397
 
4.7%
P10079
 
3.8%
Other values (17)15844
6.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter219448
83.6%
Uppercase Letter37478
 
14.3%
Space Separator2853
 
1.1%
Decimal Number2632
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e36866
16.8%
r35204
16.0%
u33704
15.4%
l25814
11.8%
a25041
11.4%
g23699
10.8%
m20158
9.2%
i12397
 
5.6%
o2594
 
1.2%
s2189
 
1.0%
Other values (6)1782
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
R23605
63.0%
P10079
26.9%
E1371
 
3.7%
G1318
 
3.5%
D911
 
2.4%
M74
 
0.2%
C60
 
0.2%
N60
 
0.2%
Decimal Number
ValueCountFrequency (%)
51316
50.0%
81316
50.0%
Space Separator
ValueCountFrequency (%)
2853
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin256926
97.9%
Common5485
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e36866
14.3%
r35204
13.7%
u33704
13.1%
l25814
10.0%
a25041
9.7%
g23699
9.2%
R23605
9.2%
m20158
7.8%
i12397
 
4.8%
P10079
 
3.9%
Other values (14)10359
 
4.0%
Common
ValueCountFrequency (%)
2853
52.0%
51316
24.0%
81316
24.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII262411
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e36866
14.0%
r35204
13.4%
u33704
12.8%
l25814
9.8%
a25041
9.5%
g23699
9.0%
R23605
9.0%
m20158
7.7%
i12397
 
4.7%
P10079
 
3.8%
Other values (17)15844
6.0%

Fuel Barrels/Year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 123
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.6090555
Minimum: 0.06
Maximum: 47.08714286
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 0.06
5-th percentile: 11.36586207
Q1: 14.69942308
median: 17.34789474
Q3: 20.600625
95-th percentile: 25.35461538
Maximum: 47.08714286
Range: 47.02714286
Interquartile range (IQR): 5.901201923

Descriptive statistics

Standard deviation: 4.467282686
Coefficient of variation (CV): 0.2536923508
Kurtosis: 1.468284879
Mean: 17.6090555
Median Absolute Deviation (MAD): 3.017025172
Skewness: 0.6382712089
Sum: 633080.7634
Variance: 19.9566146
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
18.31166667         3492   9.7%
17.34789474         3203   8.9%
15.69571429         3120   8.7%
16.4805             3061   8.5%
19.38882353         2277   6.3%
20.600625           2226   6.2%
14.98227273         2213   6.2%
21.974              2210   6.1%
23.54357143         1940   5.4%
14.33086957         1762   4.9%
Other values (113)  10448  29.1%

Minimum Values

Value               Count  Frequency (%)
0.06                4      < 0.1%
0.06642857143       11     < 0.1%
0.06888888889       3      < 0.1%
0.08086956522       1      < 0.1%
0.08454545455       2      < 0.1%
0.1033333333        2      < 0.1%
0.1094117647        3      < 0.1%
0.11625             3      < 0.1%
0.124               1      < 0.1%
0.1328571429        2      < 0.1%

Maximum Values

Value               Count  Frequency (%)
47.08714286         5      < 0.1%
41.20125            21     0.1%
36.62333333         36     0.1%
32.961              130    0.4%
29.96454545         439    1.2%
27.4675             730    2.0%
27.29892857         5      < 0.1%
25.479              30     0.1%
25.35461538         1245   3.5%
23.8865625          86     0.2%

City MPG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 48
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.6461393
Minimum: 6
Maximum: 58
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 6
5-th percentile: 11
Q1: 15
median: 17
Q3: 20
95-th percentile: 26
Maximum: 58
Range: 52
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.769348814
Coefficient of variation (CV): 0.2702771826
Kurtosis: 5.087694024
Mean: 17.6461393
Median Absolute Deviation (MAD): 3
Skewness: 1.468995384
Sum: 634414
Variance: 22.74668811
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)

Common Values

Value               Count  Frequency (%)
15                  4158   11.6%
17                  3690   10.3%
16                  3577   9.9%
18                  3547   9.9%
14                  2760   7.7%
19                  2629   7.3%
21                  1993   5.5%
13                  1959   5.4%
20                  1905   5.3%
12                  1512   4.2%
Other values (38)   8222   22.9%

Minimum Values

Value               Count  Frequency (%)
6                   5      < 0.1%
7                   20     0.1%
8                   84     0.2%
9                   194    0.5%
10                  516    1.4%
11                  1422   4.0%
12                  1512   4.2%
13                  1959   5.4%
14                  2760   7.7%
15                  4158   11.6%

Maximum Values

Value               Count  Frequency (%)
58                  2      < 0.1%
54                  2      < 0.1%
53                  5      < 0.1%
51                  10     < 0.1%
50                  2      < 0.1%
49                  3      < 0.1%
48                  12     < 0.1%
47                  2      < 0.1%
45                  6      < 0.1%
44                  16     < 0.1%

Highway MPG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 49
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.88064642
Minimum: 9
Maximum: 61
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 9
5-th percentile: 15
Q1: 20
median: 24
Q3: 27
95-th percentile: 34
Maximum: 61
Range: 52
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.890876356
Coefficient of variation (CV): 0.2466799371
Kurtosis: 0.934351372
Mean: 23.88064642
Median Absolute Deviation (MAD): 4
Skewness: 0.6175942148
Sum: 858557
Variance: 34.70242424
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Common Values

Value               Count  Frequency (%)
24                  2953   8.2%
26                  2871   8.0%
22                  2723   7.6%
23                  2545   7.1%
20                  2351   6.5%
25                  2104   5.9%
17                  2098   5.8%
27                  1718   4.8%
28                  1678   4.7%
19                  1668   4.6%
Other values (39)   13243  36.8%

Minimum Values

Value               Count  Frequency (%)
9                   10     < 0.1%
10                  60     0.2%
11                  59     0.2%
12                  255    0.7%
13                  280    0.8%
14                  443    1.2%
15                  842    2.3%
16                  1281   3.6%
17                  2098   5.8%
18                  1570   4.4%

Maximum Values

Value               Count  Frequency (%)
61                  1      < 0.1%
60                  1      < 0.1%
59                  2      < 0.1%
58                  3      < 0.1%
53                  3      < 0.1%
52                  6      < 0.1%
51                  4      < 0.1%
50                  4      < 0.1%
49                  14     < 0.1%
48                  11     < 0.1%

Combined MPG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 46
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.92932243
Minimum: 7
Maximum: 56
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 7
5-th percentile: 13
Q1: 16
median: 19
Q3: 23
95-th percentile: 29
Maximum: 56
Range: 49
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.112408945
Coefficient of variation (CV): 0.2565269824
Kurtosis: 2.722551335
Mean: 19.92932243
Median Absolute Deviation (MAD): 3
Skewness: 1.067772702
Sum: 716499
Variance: 26.13672522
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)

Common Values

Value               Count  Frequency (%)
18                  3559   9.9%
19                  3282   9.1%
21                  3158   8.8%
20                  3093   8.6%
17                  2420   6.7%
16                  2315   6.4%
22                  2247   6.2%
15                  2241   6.2%
14                  1947   5.4%
23                  1794   5.0%
Other values (36)   9896   27.5%

Minimum Values

Value               Count  Frequency (%)
7                   5      < 0.1%
8                   21     0.1%
9                   40     0.1%
10                  131    0.4%
11                  441    1.2%
12                  741    2.1%
13                  1255   3.5%
14                  1947   5.4%
15                  2241   6.2%
16                  2315   6.4%

Maximum Values

Value               Count  Frequency (%)
56                  2      < 0.1%
53                  4      < 0.1%
52                  5      < 0.1%
50                  15     < 0.1%
48                  2      < 0.1%
47                  16     < 0.1%
46                  11     < 0.1%
45                  5      < 0.1%
44                  5      < 0.1%
43                  7      < 0.1%

CO2 Emission Grams/Mile
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 575
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 475.3163393
Minimum: 37
Maximum: 1269.571429
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 281.0 KiB

Quantile statistics

Minimum: 37
5-th percentile: 306.4482759
Q1: 395
median: 467.7368421
Q3: 555.4375
95-th percentile: 683.6153846
Maximum: 1269.571429
Range: 1232.571429
Interquartile range (IQR): 160.4375

Descriptive statistics

Standard deviation: 119.0607732
Coefficient of variation (CV): 0.2504874405
Kurtosis: 1.263582798
Mean: 475.3163393
Median Absolute Deviation (MAD): 81.34553776
Skewness: 0.7416918392
Sum: 17088573.03
Variance: 14175.46772
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
493.7222222         3114   8.7%
467.7368421         2763   7.7%
444.35              2697   7.5%
423.1904762         2662   7.4%
592.4666667         2023   5.6%
555.4375            2022   5.6%
522.7647059         2010   5.6%
403.9545455         1819   5.1%
634.7857143         1801   5.0%
386.3913043         1367   3.8%
Other values (565)  13674  38.0%

Minimum Values

Value               Count  Frequency (%)
37                  1      < 0.1%
40                  2      < 0.1%
51                  2      < 0.1%
81                  3      < 0.1%
84                  1      < 0.1%
87                  1      < 0.1%
91                  3      < 0.1%
101                 1      < 0.1%
104                 2      < 0.1%
112                 1      < 0.1%

Maximum Values

Value               Count  Frequency (%)
1269.571429         5      < 0.1%
1110.875            21     0.1%
987.4444444         36     0.1%
888.7               127    0.4%
847                 3      < 0.1%
807.9090909         434    1.2%
805                 3      < 0.1%
801                 1      < 0.1%
790                 1      < 0.1%
787.725             4      < 0.1%

Fuel Cost/Year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct        55
Distinct (%)    0.2%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            1892.598465
Minimum         600
Maximum         5800
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     281.0 KiB

Quantile statistics

Minimum                     600
5-th percentile             1200
Q1                          1500
Median                      1850
Q3                          2200
95-th percentile            2800
Maximum                     5800
Range                       5200
Interquartile range (IQR)   700

Descriptive statistics

Standard deviation                506.9586274
Coefficient of variation (CV)     0.2678638057
Kurtosis                          1.74029905
Mean                              1892.598465
Median Absolute Deviation (MAD)   350
Skewness                          0.8345314918
Sum                               68042700
Variance                          257007.0499
Monotonicity                      Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
1850                2849    7.9%
1950                2818    7.8%
1600                2297    6.4%
1750                2271    6.3%
2400                2086    5.8%
1650                1935    5.4%
2200                1765    4.9%
1500                1706    4.7%
2100                1607    4.5%
2550                1512    4.2%
Other values (45)   15106   42.0%
Value   Count   Frequency (%)
600     2       < 0.1%
650     24      0.1%
700     29      0.1%
750     13      < 0.1%
800     75      0.2%
850     77      0.2%
900     84      0.2%
950     66      0.2%
1000    248     0.7%
1050    420     1.2%
Value   Count   Frequency (%)
5800    5       < 0.1%
5050    4       < 0.1%
4500    6       < 0.1%
4150    17      < 0.1%
4050    53      0.1%
3700    176     0.5%
3400    232     0.6%
3350    81      0.2%
3100    338     0.9%
3050    294     0.8%

Interactions


Correlations


Auto

The auto setting selects an easily interpretable pairwise metric according to the variable types of each pair of columns: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V (with the numerical column discretized), and numerical-numerical pairs use Spearman's ρ. Each pair of columns is therefore scored with the metric best suited to it.
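
The type-to-metric mapping can be sketched as a small lookup. The function name `auto_method` and the type labels are hypothetical and chosen for illustration; this is not the profiling library's API, only a restatement of the mapping described above.

```python
# Hypothetical type labels; the report itself uses "Categorical" and "Numeric".
CATEGORICAL = "categorical"
NUMERICAL = "numerical"


def auto_method(type_a: str, type_b: str) -> str:
    """Return the metric named by the auto mapping for a pair of column types."""
    for column_type in (type_a, type_b):
        if column_type not in (CATEGORICAL, NUMERICAL):
            raise ValueError(f"unsupported variable type: {column_type!r}")
    if type_a == NUMERICAL and type_b == NUMERICAL:
        return "Spearman's rho"
    # Any pair involving a categorical column falls back to Cramér's V
    # (a numerical partner would be discretized first).
    return "Cramér's V"


# A few columns from this report and their types.
column_types = {
    "Make": CATEGORICAL,
    "Model": CATEGORICAL,
    "CO2 Emission Grams/Mile": NUMERICAL,
    "Fuel Cost/Year": NUMERICAL,
}

pair_metric = auto_method(column_types["Make"], column_types["Fuel Cost/Year"])
```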

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
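
A minimal sketch of that calculation in plain Python follows: rank both variables (ties receive the average of their positions, an assumption the text does not spell out), then take the covariance of the ranks divided by the product of their standard deviations. The data are the Combined MPG and CO2 Emission Grams/Mile values from the ten first rows of the sample in this report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)


combined_mpg = [17, 13, 16, 13, 16, 22, 24, 20, 22, 24]
co2 = [522.764706, 683.615385, 555.4375, 683.615385, 555.4375,
       403.954545, 370.291667, 444.35, 403.954545, 370.291667]

rho = spearman_rho(combined_mpg, co2)
```

In these ten rows CO2 falls whenever Combined MPG rises, so ρ comes out at -1 (up to floating-point error) even though the relationship is not a straight line.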

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which means that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
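
The sketch below implements that ratio directly and illustrates the invariance property: shifting and rescaling either variable (with a positive scale factor) leaves r unchanged. The data are Engine Displacement and Fuel Cost/Year from the ten first rows of the sample in this report. The rescaling constants are arbitrary values chosen for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


displacement = [2.5, 4.2, 2.5, 4.2, 3.8, 2.2, 2.2, 3.0, 2.3, 2.3]
fuel_cost = [1950, 2550, 2100, 2550, 2550, 1500, 1400, 1650, 1500, 1400]

r = pearson_r(displacement, fuel_cost)

# Change units and origin of both variables; r should not move.
r_rescaled = pearson_r(
    [1000 * d + 5 for d in displacement],  # arbitrary scale and shift
    [c / 12 for c in fuel_cost],           # e.g. yearly cost expressed per month
)
```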

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
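
The pair-counting definition can be sketched as follows. This is the plain form described above, in which tied pairs count as neither concordant nor discordant. Library implementations commonly apply an additional tie correction (for example the tau-b variant that is the default in `scipy.stats.kendalltau`), so their results may differ on data with ties. The data are toy values chosen for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts for neither side
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant and one discordant (the 2nd and 3rd points).
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```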

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
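
A minimal sketch of the bias-corrected estimator, computed from a contingency table of observed counts, is given below. It follows the correction as usually stated (shrink φ² = χ²/n by (k-1)(r-1)/(n-1) and shrink the table dimensions correspondingly). The function name and the toy tables are illustrative, and the sketch assumes no row or column of the table is entirely zero.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))


independent = cramers_v_corrected([[10, 20], [20, 40]])  # rows are proportional
perfect = cramers_v_corrected([[50, 0], [0, 50]])        # each row maps to one column
```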

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

Index | Make | Model | Year | Engine Displacement | Cylinders | Transmission | Drivetrain | Vehicle Class | Fuel Type | Fuel Barrels/Year | City MPG | Highway MPG | Combined MPG | CO2 Emission Grams/Mile | Fuel Cost/Year
0 | AM General | DJ Po Vehicle 2WD | 1984 | 2.5 | 4.0 | Automatic 3-spd | 2-Wheel Drive | Special Purpose Vehicle 2WD | Regular | 19.388824 | 18 | 17 | 17 | 522.764706 | 1950
1 | AM General | FJ8c Post Office | 1984 | 4.2 | 6.0 | Automatic 3-spd | 2-Wheel Drive | Special Purpose Vehicle 2WD | Regular | 25.354615 | 13 | 13 | 13 | 683.615385 | 2550
2 | AM General | Post Office DJ5 2WD | 1985 | 2.5 | 4.0 | Automatic 3-spd | Rear-Wheel Drive | Special Purpose Vehicle 2WD | Regular | 20.600625 | 16 | 17 | 16 | 555.437500 | 2100
3 | AM General | Post Office DJ8 2WD | 1985 | 4.2 | 6.0 | Automatic 3-spd | Rear-Wheel Drive | Special Purpose Vehicle 2WD | Regular | 25.354615 | 13 | 13 | 13 | 683.615385 | 2550
4 | ASC Incorporated | GNX | 1987 | 3.8 | 6.0 | Automatic 4-spd | Rear-Wheel Drive | Midsize Cars | Premium | 20.600625 | 14 | 21 | 16 | 555.437500 | 2550
5 | Acura | 2.2CL/3.0CL | 1997 | 2.2 | 4.0 | Automatic 4-spd | Front-Wheel Drive | Subcompact Cars | Regular | 14.982273 | 20 | 26 | 22 | 403.954545 | 1500
6 | Acura | 2.2CL/3.0CL | 1997 | 2.2 | 4.0 | Manual 5-spd | Front-Wheel Drive | Subcompact Cars | Regular | 13.733750 | 22 | 28 | 24 | 370.291667 | 1400
7 | Acura | 2.2CL/3.0CL | 1997 | 3.0 | 6.0 | Automatic 4-spd | Front-Wheel Drive | Subcompact Cars | Regular | 16.480500 | 18 | 26 | 20 | 444.350000 | 1650
8 | Acura | 2.3CL/3.0CL | 1998 | 2.3 | 4.0 | Automatic 4-spd | Front-Wheel Drive | Subcompact Cars | Regular | 14.982273 | 19 | 27 | 22 | 403.954545 | 1500
9 | Acura | 2.3CL/3.0CL | 1998 | 2.3 | 4.0 | Manual 5-spd | Front-Wheel Drive | Subcompact Cars | Regular | 13.733750 | 21 | 29 | 24 | 370.291667 | 1400

Last rows

Index | Make | Model | Year | Engine Displacement | Cylinders | Transmission | Drivetrain | Vehicle Class | Fuel Type | Fuel Barrels/Year | City MPG | Highway MPG | Combined MPG | CO2 Emission Grams/Mile | Fuel Cost/Year
35942 | smart | fortwo coupe | 2008 | 1.0 | 3.0 | Automatic (S5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 33 | 41 | 36 | 246.861111 | 1100
35943 | smart | fortwo coupe | 2009 | 1.0 | 3.0 | Automatic (AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 33 | 41 | 36 | 246.861111 | 1100
35944 | smart | fortwo coupe | 2010 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 33 | 41 | 36 | 246.861111 | 1100
35945 | smart | fortwo coupe | 2011 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 33 | 41 | 36 | 246.861111 | 1100
35946 | smart | fortwo coupe | 2012 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 34 | 38 | 36 | 246.861111 | 1100
35947 | smart | fortwo coupe | 2013 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 34 | 38 | 36 | 244.000000 | 1100
35948 | smart | fortwo coupe | 2014 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 34 | 38 | 36 | 243.000000 | 1100
35949 | smart | fortwo coupe | 2015 | 1.0 | 3.0 | Auto(AM5) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 34 | 38 | 36 | 244.000000 | 1100
35950 | smart | fortwo coupe | 2016 | 0.9 | 3.0 | Auto(AM6) | Rear-Wheel Drive | Two Seaters | Premium | 9.155833 | 34 | 39 | 36 | 246.000000 | 1100
35951 | smart | fortwo coupe | 2016 | 0.9 | 3.0 | Manual 5-spd | Rear-Wheel Drive | Two Seaters | Premium | 9.417429 | 32 | 39 | 35 | 255.000000 | 1150